Language Model for Cyrillic Mongolian to Traditional Mongolian Conversion

Authors

  • Feilong Bao
  • Guanglai Gao
  • Xueliang Yan
  • Hongwei Wang
Abstract

Traditional Mongolian and Cyrillic Mongolian are both written forms of Mongolian, used in China and Mongolia respectively. Although their spoken pronunciation is similar, their written forms are entirely different. A large proportion of Cyrillic Mongolian words have more than one corresponding form in Traditional Mongolian, which makes conversion from Cyrillic Mongolian to Traditional Mongolian a hard problem. To overcome this difficulty, this paper proposes a language-model-based approach that takes advantage of context information. Experimental results show that, for Cyrillic Mongolian words that have multiple correspondences in Traditional Mongolian, the accuracy of this approach reaches 87.66%, thereby greatly improving overall system performance.
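The abstract does not say which language model or decoding procedure the authors use. As a rough illustration of how context can select among several candidate target forms, here is a minimal sketch that assumes a bigram model with add-one smoothing and Viterbi decoding. The function names (`train_bigram`, `log_prob`, `convert`), the candidate table, and the placeholder tokens (`a`, `A1`, `A2`, ...) are all hypothetical and stand in for real Cyrillic and Traditional Mongolian words.

```python
import math
from collections import defaultdict


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized target-side sentences."""
    context_counts = defaultdict(int)  # how often each token appears as a left context
    bigram_counts = defaultdict(int)
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent
        vocab.update(sent)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def log_prob(prev, cur, context_counts, bigram_counts, vocab_size):
    """Add-one smoothed log P(cur | prev)."""
    num = bigram_counts.get((prev, cur), 0) + 1
    den = context_counts.get(prev, 0) + vocab_size
    return math.log(num / den)


def convert(source_words, candidates, context_counts, bigram_counts, vocab_size):
    """Pick the most probable target sequence with Viterbi decoding."""
    # best maps the last chosen target word to (score, path so far).
    best = {"<s>": (0.0, [])}
    for word in source_words:
        # Words without a candidate entry are passed through unchanged.
        options = candidates.get(word, [word])
        new_best = {}
        for cand in options:
            score, path = max(
                (
                    (s + log_prob(prev, cand, context_counts, bigram_counts, vocab_size), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy data: source word "a" is ambiguous between "A1" and "A2".
corpus = [["A1", "B"], ["A1", "B"], ["A2", "C"]]
candidates = {"a": ["A1", "A2"], "b": ["B"], "c": ["C"]}
context_counts, bigram_counts, vocab_size = train_bigram(corpus)

print(convert(["a", "b"], candidates, context_counts, bigram_counts, vocab_size))
print(convert(["a", "c"], candidates, context_counts, bigram_counts, vocab_size))
```

In this toy setup, picking the most frequent candidate would always map `a` to `A1`. The bigram context lets the second call choose `A2` because it is the form seen before `C` in the training sentences.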


Similar resources

Retrieval in Texts with Traditional Mongolian Script Realizing Unicoded Traditional Mongolian Digital Library

This paper discusses our approaches to creating a digital library of traditional Mongolian script using Unicode. We also introduce the system architecture of a digital library that stores books and materials of historical importance written in traditional Mongolian, which cover 1,000 years of history and are an important part of Mongolian culture. Specifically, we propose a technique that will allow us...

Cyrillic Mongolian Named Entity Recognition with Rich Features

In this paper, we first create a manually annotated Cyrillic Mongolian named entity corpus. The annotation types include person names, location names, organization names and other proper names. Then, we use Conditional Random Fields as the classifier and design several categories of features for Mongolian, including orthographic feature, morphological feature, gazetteer feature, syllable feature, word cluster...

Extracting Loanwords from Mongolian Corpora and Producing a Japanese-Mongolian Bilingual Dictionary

This paper proposes methods for extracting loanwords from Cyrillic Mongolian corpora and producing a Japanese–Mongolian bilingual dictionary. We extract loanwords from Mongolian corpora using our own handcrafted rules. To complement the rule-based extraction, we also extract words in Mongolian corpora that are phonetically similar to Japanese Katakana words as loanwords. In addition, we corresp...

A Study of Traditional Mongolian Script Encodings and Rendering: Use of Unicode in OpenType fonts

This article discusses the rendering issues of complex text layouts, particularly traditional Mongolian script. Some standards such as Unicode and OpenType format have been implemented and are supported widely. Traditional Mongolian script has been standardized in Unicode. We analyzed existing OpenType fonts and their rendering schemes for traditional Mongolian script. We found some errors, and...

A Novel Approach to Improve the Mongolian Language Model Using Intermediate Characters

In the Mongolian language, many words share the same presentation form but are different words with different codes. Since typists usually input words according to their presentation forms and sometimes cannot distinguish the codes, many coding errors occur in Mongolian corpora. This makes statistics and retrieval very difficult on such a Mongo...

Publication date: 2013